important information
InfiMed: Low-Resource Medical MLLMs with Advancing Understanding and Reasoning
Liu, Zeyu, Hou, Zhitian, Zhu, Guanghao, Sang, Zhijie, Xie, Congkai, Yang, Hongxia
Multimodal Large Language Models (MLLMs) have achieved remarkable progress in domains such as visual understanding and mathematical reasoning. However, their application in the medical domain is constrained by two key challenges: (1) multimodal medical datasets are scarce and often contain sparse information, limiting reasoning depth; and (2) Reinforcement Learning with Verifiable Rewards (RLVR), though effective in general domains, cannot reliably improve model performance in the medical domain. To overcome these challenges, during the supervised fine-tuning (SFT) stage, we incorporate high-quality textual reasoning data and general multimodal data alongside multimodal medical data to efficiently enhance foundational medical capabilities and restore the base model's reasoning ability. Moreover, considering that there are some multimodal medical datasets with sparse information, we further synthesize reflective-pattern-injected chain-of-thought (CoT) in addition to general CoT samples, equipping the model with initial reflective reasoning capabilities that provide a structured foundation for subsequent RLVR training. Finally, we introduce our InfiMed-Series models, InfiMed-SFT-3B and InfiMed-RL-3B, both of which deliver state-of-the-art performance across seven multimodal medical benchmarks. Notably, InfiMed-RL-3B achieves an average accuracy of 59.2%, outperforming even larger models like InternVL3-8B, which achieves 57.3%. Specifically, during the SFT phase, we utilized 188K samples, while the RLVR phase incorporated 36K samples, demonstrating the efficacy of both training strategies in achieving superior performance. We also conducted a series of extensive experiments, which provide valuable insights that contribute to advancing the performance of MLLMs in medical scenarios.
- North America > United States > California (0.04)
- Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)
- Asia > Thailand > Bangkok > Bangkok (0.04)
- Asia > China > Hong Kong (0.04)
- Education (1.00)
- Health & Medicine > Diagnostic Medicine > Imaging (0.93)
Learning When to Plan: Efficiently Allocating Test-Time Compute for LLM Agents
Paglieri, Davide, Cupiał, Bartłomiej, Cook, Jonathan, Piterbarg, Ulyana, Tuyls, Jens, Grefenstette, Edward, Foerster, Jakob Nicolaus, Parker-Holder, Jack, Rocktäschel, Tim
Training large language models (LLMs) to reason via reinforcement learning (RL) significantly improves their problem-solving capabilities. In agentic settings, existing methods like ReAct prompt LLMs to explicitly plan before every action; however, we demonstrate that always planning is computationally expensive and degrades performance on long-horizon tasks, while never planning further limits performance. To address this, we introduce a conceptual framework formalizing dynamic planning for LLM agents, enabling them to flexibly decide when to allocate test-time compute for planning. We propose a simple two-stage training pipeline: (1) supervised fine-tuning on diverse synthetic data to prime models for dynamic planning, and (2) RL to refine this capability in long-horizon environments. Experiments on the Crafter environment show that dynamic planning agents trained with this approach are more sample-efficient and consistently achieve more complex objectives. Additionally, we demonstrate that these agents can be effectively steered by human-written plans, surpassing their independent capabilities. To our knowledge, this work is the first to explore training LLM agents for dynamic test-time compute allocation in sequential decision-making tasks, paving the way for more efficient, adaptive, and controllable agentic systems.
- North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
- North America > Canada > British Columbia > Vancouver (0.04)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- (3 more...)
- Workflow (1.00)
- Research Report > New Finding (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
Man dies after being pulled into MRI machine by metal necklace he was wearing
Ezra founder and CEO Emi Gal explains on 'Fox & Friends Weekend' how artificial intelligence can 'enhance' MRI scans, image quality, analysis, and comprehension. A man has died after getting sucked into an MRI machine. The accident occurred on July 16 at the Nassau Open MRI in Westbury, New York, according to a press release from the Nassau County Police Department in Long Island. Officers responded to a 911 call at around 4:30 p.m. at the MRI center, which provides diagnostic radiology services. "Upon arrival, officers were informed that a male, 61, entered an unauthorized Magnetic Resonance Imaging (MRI) room while the scan was in progress," the release stated.
- North America > United States > New York > Nassau County > Westbury (0.26)
- North America > United States > South Carolina > Charleston County > Charleston (0.05)
Resource for Error Analysis in Text Simplification: New Taxonomy and Test Collection
Vendeville, Benjamin, Ermakova, Liana, De Loor, Pierre
The general public often encounters complex texts but does not have the time or expertise to fully understand them, leading to the spread of misinformation. Automatic Text Simplification (ATS) helps make information more accessible, but its evaluation methods have not kept up with advances in text generation, especially with Large Language Models (LLMs). In particular, recent studies have shown that current ATS metrics do not correlate with the presence of errors. Manual inspections have further revealed a variety of errors, underscoring the need for a more nuanced evaluation framework, which is currently lacking. This resource paper addresses this gap by introducing a test collection for detecting and classifying errors in simplified texts. First, we propose a taxonomy of errors, with a formal focus on information distortion. Next, we introduce a parallel dataset of automatically simplified scientific texts. This dataset has been human-annotated with labels based on our proposed taxonomy. Finally, we analyze the quality of the dataset, and we study the performance of existing models to detect and classify errors from that taxonomy. These contributions give researchers the tools to better evaluate errors in ATS, develop more reliable models, and ultimately improve the quality of automatically simplified texts.
- Europe > Italy (0.06)
- Europe > France > Brittany > Finistère > Brest (0.05)
- North America > United States > New York > New York County > New York City (0.04)
- (7 more...)
- Health & Medicine (0.95)
- Education (0.68)
DReSS: Data-driven Regularized Structured Streamlining for Large Language Models
Feng, Mingkuan, Wu, Jinyang, Zhang, Shuai, Shao, Pengpeng, Jin, Ruihan, Wen, Zhengqi, Tao, Jianhua, Che, Feihu
Large language models (LLMs) have achieved significant progress across various domains, but their increasing scale results in high computational and memory costs. Recent studies have revealed that LLMs exhibit sparsity, providing the potential to reduce model size through pruning techniques. However, existing pruning methods typically follow a prune-then-finetune paradigm. Since the pruned components still contain valuable information, their direct removal often leads to irreversible performance degradation, imposing a substantial computational burden to recover performance during finetuning. In this paper, we propose a novel paradigm that first applies regularization, then prunes, and finally finetunes. Based on this paradigm, we introduce DReSS, a simple and effective Data-driven Regularized Structured Streamlining method for LLMs. By leveraging a small amount of data to regularize the components to be pruned, DReSS explicitly transfers the important information to the remaining parts of the model in advance. Compared to direct pruning, this can reduce the information loss caused by parameter removal, thereby enhancing its language modeling capabilities. Experimental results demonstrate that DReSS significantly outperforms existing pruning methods even under extreme pruning ratios, significantly reducing latency and increasing throughput.
MetaIE: Distilling a Meta Model from LLM for All Kinds of Information Extraction Tasks
Peng, Letian, Wang, Zilong, Yao, Feng, Wang, Zihan, Shang, Jingbo
Information extraction (IE) is a fundamental area in natural language processing where prompting large language models (LLMs), even with in-context examples, cannot defeat small LMs tuned on very small IE datasets. We observe that IE tasks, such as named entity recognition and relation extraction, all focus on extracting important information, which can be formalized as a label-to-span matching. In this paper, we propose a novel framework, MetaIE, to build a small LM as a meta-model by learning to extract "important information", i.e., the meta-understanding of IE, so that this meta-model can be adapted to all kinds of IE tasks effectively and efficiently. Specifically, MetaIE obtains the small LM via a symbolic distillation from an LLM following the label-to-span scheme. We construct the distillation dataset via sampling sentences from language model pre-training datasets (e.g., OpenWebText in our implementation) and prompting an LLM to identify the typed spans of "important information". We evaluate the meta-model under the few-shot adaptation setting. Extensive results on 13 datasets from 6 IE tasks confirm that MetaIE can offer a better starting point for few-shot tuning on IE datasets and outperform other meta-models from (1) vanilla language model pre-training, (2) multi-IE-task pre-training with human annotations, and (3) single-IE-task symbolic distillation from LLM. Moreover, we provide comprehensive analyses of MetaIE, such as the size of the distillation dataset, the meta-model architecture, and the size of the meta-model.
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- North America > Canada > Ontario > Toronto (0.04)
- Asia > Middle East > Jordan (0.04)
- (17 more...)
Better Explain Transformers by Illuminating Important Information
Song, Linxin, Cui, Yan, Luo, Ao, Lecue, Freddy, Li, Irene
Transformer-based models excel in various natural language processing (NLP) tasks, attracting countless efforts to explain their inner workings. Prior methods explain Transformers by using the raw gradient and attention as token attribution scores, so irrelevant information is often included in the explanation computation, producing confusing results. In this work, we propose highlighting the important information and eliminating irrelevant information by a refined information flow on top of the layer-wise relevance propagation (LRP) method. Specifically, we identify syntactic and positional heads as important attention heads and focus on the relevance obtained from these important heads. Experimental results demonstrate that irrelevant information does distort output attribution scores and should therefore be masked during explanation computation. Compared to eight baselines on both classification and question-answering datasets, our method consistently outperforms them, with improvements of 3% to 33% on explanation metrics. Our anonymous code repository is available at: https://github.com/LinxinS97/Mask-LRP
- Europe > Switzerland > Zürich > Zürich (0.14)
- Europe > Italy > Marche > Ancona Province > Ancona (0.04)
- Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.04)
- (10 more...)
Less is More for Long Document Summary Evaluation by LLMs
Wu, Yunshu, Iso, Hayate, Pezeshkpour, Pouya, Bhutani, Nikita, Hruschka, Estevam
Large Language Models (LLMs) have shown promising performance in summary evaluation tasks, yet they face challenges such as high computational costs and the Lost-in-the-Middle problem where important information in the middle of long documents is often overlooked. To address these issues, this paper introduces a novel approach, Extract-then-Evaluate, which involves extracting key sentences from a long source document and then evaluating the summary by prompting LLMs. The results reveal that the proposed method not only significantly reduces evaluation costs but also exhibits a higher correlation with human evaluations. Furthermore, we provide practical recommendations for optimal document length and sentence extraction methods, contributing to the development of cost-effective yet more accurate methods for LLM-based text generation evaluation.
- Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.04)
- North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
- North America > United States > California > Riverside County > Riverside (0.04)
- (7 more...)
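The Extract-then-Evaluate idea in the abstract above (extract key sentences from a long source document, then prompt an LLM to evaluate the summary) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the word-overlap scoring, the naive sentence splitter, the prompt wording, and the names `extract_key_sentences`, `evaluate_summary`, and `llm` are all hypothetical, and `llm` stands for any caller-supplied function that maps a prompt string to a response string.

```python
import re


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokens(text):
    """Lowercased set of word tokens."""
    return set(re.findall(r"\w+", text.lower()))


def extract_key_sentences(document, summary, max_words):
    """Pick the source sentences that overlap most with the summary,
    within a word budget, and return them in document order.
    (Assumed extraction rule; other extractors could be swapped in.)"""
    summary_tokens = tokens(summary)
    sentences = split_sentences(document)

    def overlap(sentence):
        words = tokens(sentence)
        return len(words & summary_tokens) / len(words) if words else 0.0

    ranked = sorted(range(len(sentences)),
                    key=lambda i: overlap(sentences[i]), reverse=True)
    chosen, used = [], 0
    for i in ranked:
        length = len(sentences[i].split())
        if used + length <= max_words:  # skip sentences that exceed the budget
            chosen.append(i)
            used += length
    return [sentences[i] for i in sorted(chosen)]


def evaluate_summary(document, summary, llm, max_words=200):
    """Build a short evaluation prompt from the extracted sentences only.
    `llm` is a hypothetical callable: prompt string in, response string out."""
    extracted = " ".join(extract_key_sentences(document, summary, max_words))
    prompt = (
        "Source (extracted sentences):\n" + extracted + "\n\n"
        "Summary:\n" + summary + "\n\n"
        "Rate the summary's consistency with the source from 1 to 5."
    )
    return llm(prompt)
```

Because only the extracted sentences reach the prompt, the evaluation input is bounded by `max_words` regardless of how long the source document is, which is where the cost reduction would come from in a sketch like this.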
Hybrid of DiffStride and Spectral Pooling in Convolutional Neural Networks
Rafif, Sulthan, Pratama, Mochamad Arfan Ravy Wahyu, Azhar, Mohammad Faris, Ibad, Ahmad Mustafidul, Muflikhah, Lailil, Yudistira, Novanto
Stride determines the distance between adjacent filter positions as the filter moves across the input. With a fixed stride, important information in the image may not be captured and therefore cannot contribute to classification. Previous research addressed this with the DiffStride method, a strided convolution method that learns its own stride value. The max pooling downsampling method, in turn, suffers from severe quantization and a constraining lower bound on preserved information. Spectral pooling relaxes this lower bound by cutting off the representation in the frequency domain. In this research, a CNN model is proposed that combines downsampling with a learnable stride, trained by backpropagation, with the spectral pooling technique. DiffStride and spectral pooling are expected to preserve most of the information contained in the image. In this study, we compare the hybrid method, a combined implementation of spectral pooling and DiffStride, against the baseline method, the DiffStride implementation on ResNet 18. The accuracy of DiffStride combined with spectral pooling improves over the DiffStride baseline by 0.0094. This shows that the hybrid method can preserve most of the information by cutting off the representation in the frequency domain while determining the stride learned through backpropagation.
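The phrase "cutting off the representation in the frequency domain" in the abstract above can be illustrated with a small sketch. It is a simplification under stated assumptions, not the paper's model: it works on a 1D signal rather than a 2D image, uses a plain O(n²) DFT from the standard library instead of an FFT, has no learnable stride, and the function names `dft` and `spectral_pool_1d` are hypothetical.

```python
import cmath


def dft(x):
    """Plain O(n^2) discrete Fourier transform of a real or complex sequence."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def spectral_pool_1d(x, out_len):
    """Downsample x to out_len samples by keeping only its lowest frequencies.

    Steps: transform to the frequency domain, keep the out_len lowest-frequency
    bins (positive and negative), and transform back at the shorter length.
    Dividing by the original length keeps the signal mean unchanged.
    """
    n = len(x)
    if not 0 < out_len <= n:
        raise ValueError("out_len must be between 1 and len(x)")
    spectrum = dft(x)
    n_pos = (out_len + 1) // 2  # bins 0 .. n_pos-1 (non-negative frequencies)
    n_neg = out_len // 2        # last n_neg bins (negative frequencies)
    kept = spectrum[:n_pos] + spectrum[n - n_neg:]
    # Inverse DFT at the reduced length; the real part is taken because the
    # truncated spectrum is not exactly conjugate-symmetric when out_len is even.
    return [
        (sum(kept[k] * cmath.exp(2j * cmath.pi * k * t / out_len)
             for k in range(out_len)) / n).real
        for t in range(out_len)
    ]
```

Unlike max pooling, which can only shrink the input by integer factors, this kind of truncation allows any output length up to the input length, and the discarded content is the high-frequency part of the signal.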
Multi-User MultiWOZ: Task-Oriented Dialogues among Multiple Users
Jo, Yohan, Zhao, Xinyan, Biswas, Arijit, Basiou, Nikoletta, Auvray, Vincent, Malandrakis, Nikolaos, Metallinou, Angeliki, Potamianos, Alexandros
While most task-oriented dialogues assume conversations between the agent and one user at a time, dialogue systems are increasingly expected to communicate with multiple users simultaneously who make decisions collaboratively. To facilitate development of such systems, we release the Multi-User MultiWOZ dataset: task-oriented dialogues among two users and one agent. To collect this dataset, each user utterance from MultiWOZ 2.2 was replaced with a small chat between two users that is semantically and pragmatically consistent with the original user utterance, thus resulting in the same dialogue state and system response. These dialogues reflect interesting dynamics of collaborative decision-making in task-oriented scenarios, e.g., social chatter and deliberation. Supported by this data, we propose the novel task of multi-user contextual query rewriting: to rewrite a task-oriented chat between two users as a concise task-oriented query that retains only task-relevant information and that is directly consumable by the dialogue system. We demonstrate that in multi-user dialogues, using predicted rewrites substantially improves dialogue state tracking without modifying existing dialogue systems that are trained for single-user dialogues. Further, this method surpasses training a medium-sized model directly on multi-user dialogues and generalizes to unseen domains.
- North America > United States (0.04)
- North America > Canada > Ontario > Peterborough County > Peterborough (0.04)
- Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
- (5 more...)
- Research Report (1.00)
- Personal > Interview (0.93)
- Consumer Products & Services > Restaurants (0.93)
- Health & Medicine (0.68)